ABSTRACT


Neural  networks  are  appropriate  for  meteorological  classification  tasks  for  a  number  of 
reasons.  First,  their  associative  properties  allow  graceful  degradation  of  performance  under 
conditions  of  ambiguity  and  noise,  thus  avoiding  the  brittle  behavior  of  many  standard 
approaches.  Second,  they  learn  to  perform  tasks  which  cannot  easily  be  specified 
analytically, such as non-linear discriminant functions. Finally, they can be executed in
real-time on appropriate hardware. To exploit these properties, this research developed a general
approach  to  meteorological  classification  based  on  neural  network  data  fusion.  The  system 
was  applied  to  cloud  type  identification  from  satellite  imagery.  The  current  experiment  is 
one  of  the  first  to  provide  a  large  cloud  database  on  which  to  train,  and  as  such  is  one  of  the 
first  true  cross-validation  experiments  in  this  area.  While  the  27  days  of  data  provides  many 
pixel  samples  of  the  cloud  types  present  at  a  particular  hour,  the  question  to  be  answered  here 
was  whether  the  samples  collected  on  particular  types  of  clouds  sufficiently  represent  the 
variations  of  that  cloud  that  can  appear  on  a  different  day.  The  promising  results  point  to  the 
applicability of neural networks for automated generation of meteorological products in
real-time.
SECTION 1


INTRODUCTION 


Neural  networks  are  appropriate  for  meteorological  classification  tasks  for  a  number  of 
reasons.  First,  their  associative  properties  allow  graceful  degradation  of  performance  under 
conditions  of  ambiguity  and  noise,  thus  avoiding  the  brittle  behavior  of  many  standard 
approaches.  Second,  they  learn  to  perform  tasks  which  cannot  easily  be  specified 
analytically.  This  allows  improved  performance  in  perception  tasks  and  cost-effective 
retargeting  of  systems  to  additional  domains.  Finally,  they  can  be  executed  in  real-time  on 
appropriate  hardware.  To  exploit  these  properties,  this  research  developed  a  general 
approach  to  meteorological  classification  based  on  neural  network  data  fusion.  The  system 
was applied to cloud type identification from satellite imagery. However, the system could
easily  be  retrained  to  perform  a  range  of  other  meteorological  identification  tasks  such  as  the 
identification  of  hurricanes,  thunderstorm  outflow  boundaries,  etc. 

A  number  of  promising  preliminary  results  for  the  method  have  been  shown  during  the 
previous work [9,11], including a demonstration of accurate classification performance on a
limited  dataset,  graceful  degradation  of  classification  performance  over  large  shifts  of  terrain, 
and  fusion  of  ground  sensor  data  with  meteorological  imagery.  Those  initial  experiments 
were  jackknife  tests  performed  on  very  small  data  sets.  Jackknife  tests  cull  their  separate  test 
data  from  the  same  source  that  provides  the  training  data.  Since  cloud  formations  often  span 
large  distances,  it  is  probably  the  case  that  the  sample  distributions  of  both  training  and  test 
sets  were  similar.  Most  prior  cloud  typing  experiments  have  been  of  similar  small  size  and 
suffer  from  the  same  problems. 

The  current  experiment  is  one  of  the  first  to  provide  a  large  database  on  which  to  train, 
and  as  such  is  one  of  the  first  true  cross-validation  experiments  in  this  area.  The  current 
work  applied  and  refined  the  original  techniques  to  the  large  database.  While  the  27  days  of 
data  provides  many  pixel  samples  of  the  cloud  types  present  at  a  particular  hour,  the  question 
to  be  answered  here  was  whether  the  samples  collected  on  particular  types  of  clouds 
sufficiently  represent  the  variations  of  that  cloud  type  that  can  appear  on  a  different  day.  The 
promising  results  point  to  the  applicability  of  neural  networks  for  automated  generation  of 
meteorological  products  in  real-time. 

The system architecture is shown in figure 1. Heterogeneous sensor streams including
point  sensor  data  and/or  image  data  are  fed  into  the  system.  A  vision  system  utilizing  a 
number  of  biologically  plausible  theories  produces  a  range  of  non-local  products  which 
augment  the  local  training  data  for  the  neural  network  classifier  stage.  Point  sensor  data  can 
be  optionally  extrapolated  in  two  dimensions  to  match  image  data  [11].  Supervised  learning 
is  used  to  train  the  classifiers.  A  large  meteorological  database  provides  the  target  signal  for 
training. A knowledge-based system controls the performance elements of the classification
system. The simplest implementation would be a look-up table indexed by time of day and
season. To allow the inclusion of meteorological heuristics, the control component may
incorporate expert system technologies. Neural network-based control is also a possibility.
Classification performance is measured by cross-validation tests against human
classifications that were not used in training.


Figure 1. System Architecture (block label legible in the figure: Cross-Channel Classification)

Feature  vectors  for  classification  are  constructed  by  appending  image  data  with  derived 
products  generated  via  a  number  of  preprocessing  steps.  The  derived  products  provide 
information  of  a  non-local  nature  which  is  used  as  part  of  a  classification  which  operates  on 
local  regions  (i.e.,  pixels).  Vision  algorithms  applied  to  the  image  data  provide  texture  and 
morphology  features. 

The  classifier  uses  a  multiple  module  feedforward  neural  network.  It  implements  a 
number of non-linear discriminant functions which are specifically tailored to the
meteorological  task  to  provide  high  accuracy  and  good  generalization  to  untrained  data. 


1.1 RELATION TO PREVIOUS WORK

Garand [3] produced an extensive study which applied a wide range of analytical
image processing measures and a multivariate Gaussian discriminant function. Twenty
cloud classes were used, including a number of special classes (e.g., Cloud Streets, Rolls,
Polygonal Open Cells, and Strongly Convective Open Cells) and clear. Garand's study used
GOES-EAST data collected at 1600 UTC for 29 days during February 1984. The test set
comprised 1800 UTC data for January 8 and 12, 1984 and February 7, 8, and 12, 1984.
Only results for sea are reported. The method processed and reported results for 128 x 128
km regions. An overall accuracy of 79 percent was reported.

The current study used a subset of Garand’s classification scheme and also used
GOES-EAST data, but both land and sea data were included. A non-linear neural network
discrimination  function  was  used.  No  special  estimates  of  physical  parameters  were  used  as 
features;  complex  discrimination  surfaces  are  learned  from  the  raw  data,  wavelet,  and 
morphology  features.  Our  method  reports  results  for  each  pixel  instead  of  for  a  large  region, 
allowing  more  detailed  cloud  classification  maps.  The  current  study  used  test  data  from 
strictly  unseen  days  whereas  3/5  of  the  Garand  test  data  was  from  days  that  had  been 
included  in  the  training  sample  with  two  hours  of  difference  in  the  samples. 

Lee,  et  al.  [8],  produced  a  neural  network  study  which  demonstrated  high  mean 
accuracy  using  three  cloud  classes.  The  study  used  only  Visible  imagery  and  used  an 
algorithmic  texture  feature  called  the  Gray  Level  Difference  Vector  (GLDV).  The  standard 
backpropagation training algorithm was used. Their cumulus class ranged from small fair
weather  cumulus  to  mesoscale  sized  cumulus.  Their  cirrus  class  included  cirrus,  cirrostratus, 
cirrocumulus,  and  contrails.  Their  stratocumulus  class  ranged  from  solid  decks  to  breakup 
regions.  The  study  used  LANDSAT  MSS  imagery  with  a  spatial  resolution  of  57  m  per 
pixel.  Each  image  covers  185  km  by  170  km.  A  mean  accuracy  of  94  percent  was  reported. 

The  current  study  also  uses  a  neural  network  classifier.  A  subtractive  learning 
algorithm  specialized  for  generalization  performance  was  used.  Data  from  unseen  days  was 
used  for  testing.  The  high  resolution  LANDSAT  MSS  data  provides  more  accurate  texture 
information  than  the  low  resolution  GOES  data  used  for  our  study.  The  current  study  used 
eight  cloud  classes  as  opposed  to  three. 

Bankert,  et  al.  [1],  produced  a  study  which  used  the  Probabilistic  Neural  Network 
(PNN)  to  classify  clouds  in  Advanced  Very  High  Resolution  Radiometer  (AVHRR)  imagery. 
Ten  cloud  classes  were  used:  Cirrus,  Cirrocumulus,  Cirrostratus,  Altostratus,  Nimbostratus, 
Stratocumulus,  Stratus,  Cumulus,  Cumulonimbus,  and  Clear.  All  samples  were  obtained 
from  a  total  of  four  512  x  512  pixel  images,  two  of  which  occur  on  the  same  day  and  time  at 
locations separated by 14° latitude and 5° longitude. The feature vector contained 203
components,  including  the  GLDV  texture  measure  and  a  number  of  physical  measures  from 
Garand's  study. 

The  current  study  uses  a  subtractive  learning  algorithm  which  constructs  a  compact 
representation  from  an  unlimited  number  of  samples.  Techniques  such  as  these  can  utilize 
large  numbers  of  samples  with  low  memory  and  low  execution  computational  requirements. 
The  PNN  classifier  constructs  its  representation  of  the  sample  distribution  by  retaining  each 
sample  in  the  training  set  in  memory.  The  PNN  approach  limits  the  number  of  samples  that 
can be used for training and thus affects the ultimate scalability of the technique. The
AVHRR resolution should provide improved texture discrimination capability over the
GOES resolution used here.


1.2  EXPERIMENTAL  GOALS 

The  major  goal  of  the  experiment  was  to  test  the  generalization  capabilities  of  the 
neural  network  classification  system  on  completely  independent  untrained  data.  Another 
goal  was  to  test  the  effectiveness  of  various  algorithmic  improvements.  A  final  goal  was  to 
test  various  strategies  for  scaling  the  techniques  to  operational  status. 


1.3  EXPERIMENTAL  PROCEDURE 
1.3.1  Day  Cloud  Database 

GOES-EAST  satellite  imagery  was  provided  by  the  Phillips  Laboratory,  Geophysics 
Directorate.  The  data  was  collected  from  1430  to  1900  GMT  half  hourly  during  June  and 
July of 1991. The visible (0.55-0.75 µm) channel and (11 µm window) infrared channel
were used. Images of size 512x512 pixels at 1 km resolution were used. Each image
contained New England and the adjacent Atlantic Ocean. The following thirteen cloud
classes  were  used  to  classify  the  imagery: 

1.  Small  Scattered  Cumulus 

2.  Cumulus 

3.  Thin  Cirrus 

4.  Cirrus 

5.  Thin  Cirrus  over  cloud 

6.  Stratus 

7.  Stratocumulus 

8.  Altocumulus 

9.  Altostratus 

10.  Cumulonimbus 

11.  Clear 

12.  Haze 

13.  Fog 

A  computer  method  for  manually  classifying  the  image  data  sets  and  storing  files  of  the 
results  was  devised.  Classifications  were  selected  while  viewing  visible,  infrared,  or  a 
visible/infrared  composite  image.  A  mouse  was  used  to  sweep  squares  of  a  user  selectable 
size (typically 12 pixels) across the image to mark cloud samples. Each sample had an
associated  color  which  encoded  the  cloud  type.  A  menu  bar  along  the  edge  of  the  display 
was  used  to  select  the  desired  cloud  type.  Mistakes  were  easily  erased  and  corrected  either 
during the initial session or later in a resumed session. The result of the process is a new
image file called a "pick" file which contains the classification information in the selected
pixel locations. Because the analysts could not easily mark only the cloudy pixels of small
cumulus clouds, a postprocessing step was run to eliminate cases where the clear space
between cumulus clouds had been classified as cumulus.

Two consoles were used side by side. Two Phillips Laboratory Geophysics Directorate
analysts manned the consoles and worked together to produce consensus classifications.
Both analysts could see both displays. The screen on the left showed the current session.
The screen on the right showed the previous half hour imagery with its pick image
superimposed. The right console was used to make modifications and additions as needed.
Further details on the database are provided in [6].

The cloud database consists of ten samples per day taken at half hour intervals. Due to
collection problems, a maximum of 27 samples is available for some hours and as few as 22
are available for other hours. To make maximal use of the data, a series of leave-one-out
cross-validation tests was run. In each test, 26 days of data were used for training and an
unseen 27th day was used for testing. The tests were run at a single time, 1700 GMT, across
the days. Cloud types with very few samples were left out of these experiments because
their inclusion would have degraded generalization performance; all examples of these
cloud types were removed from the training and test sets. The resulting eight cloud classes
and the numbering scheme used for all results reported in the following are:

1. Small Scattered Cumulus

2.  Cumulus 

3.  Thin  Cirrus 

4.  Cirrus 

5.  Thin  Cirrus  over  cloud 

6.  Stratocumulus 

7.  Altocumulus 

8.  Clear 
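
The class reduction and the leave-one-out procedure described above can be sketched as follows. This is a minimal illustration, not the code used in the experiment; the function names and the use of integer day indices are assumptions made here.

```python
# The thirteen classes of the original database, in the order listed in section 1.3.1.
ORIGINAL_CLASSES = [
    "Small Scattered Cumulus", "Cumulus", "Thin Cirrus", "Cirrus",
    "Thin Cirrus over cloud", "Stratus", "Stratocumulus", "Altocumulus",
    "Altostratus", "Cumulonimbus", "Clear", "Haze", "Fog",
]

# The eight classes retained for the experiments, in their new numbering order.
RETAINED_CLASSES = [
    "Small Scattered Cumulus", "Cumulus", "Thin Cirrus", "Cirrus",
    "Thin Cirrus over cloud", "Stratocumulus", "Altocumulus", "Clear",
]


def remap_label(original_id):
    """Map a 1-based 13-class id to the 1-based 8-class id, or None if dropped."""
    name = ORIGINAL_CLASSES[original_id - 1]
    if name not in RETAINED_CLASSES:
        return None
    return RETAINED_CLASSES.index(name) + 1


def leave_one_day_out(days):
    """Yield (training_days, test_day) pairs, holding out each day exactly once."""
    for held_out in days:
        training = [d for d in days if d != held_out]
        yield training, held_out
```

With 27 days available at a given hour, `leave_one_day_out(list(range(27)))` produces 27 trials, each training on 26 days and testing on the remaining one.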

Figure  2a  shows  a  typical  visible  image  of  the  sample  region.  Figure  2b  shows  the 
associated infrared image. There is about 60 percent land and 40 percent water for the New
England  area  used. 
SECTION 2


PREPROCESSING  STEPS 


A  comb  filter  was  designed  and  implemented  to  remove  scan  lines  from  the  original 
visual  channel  imagery.  Feature  vectors  for  classification  were  constructed  by  appending  the 
Visible  and  IR  local  pixel  data  with  derived  products  generated  via  a  number  of 
preprocessing  steps.  The  derived  products  provide  information  of  a  non-local  nature  which  is 
then  used  as  part  of  the  input  to  a  classifier  which  operates  on  local  regions  (i.e.,  pixels).  We 
developed an FFT version of the vision software that runs on Mercury vector processors which
plug into Sun SPARCstation hosts. This step was necessary because of the large size
(512x512)  and  large  number  of  images  we  had  to  process  for  this  experiment. 


2.1  TEXTURE 

The  2D  Gabor  wavelet  transform  introduced  by  Daugman  [2]  is  an  efficient  conjoint 
Spatial/Spectral 2D information encoding. The 2D Gabor forms a non-orthogonal basis (footnote 1)
which  can  be  used  for  image  coding.  In  addition,  it  has  been  shown  to  be  useful  for  texture 
segmentation  of  imagery.  Its  characteristics  model  observed  behavior  of  simple  cells  in 
mammalian visual cortex. Equation 1 specifies the general functional form of the 2D
Gabor  family  in  terms  of  the  space-domain  impulse  response  function  G(x,y): 

G(x,y) = \exp\{-\pi[(x - x_0)^2 a^2 + (y - y_0)^2 b^2]\} \times \exp\{-2\pi i[u_0(x - x_0) + v_0(y - y_0)]\}    (1)

where (x_0, y_0) are position parameters, (u_0, v_0) are modulation parameters, and a and b
set the extent of the Gaussian envelope along x and y. A family of self-similar 2D Gabor
wavelets was used for spatial frequency analysis; the feed-forward classifier network
combines their responses to form texture detectors. In the original experiments
[9,11], six spatial frequencies spaced by half octaves were used for the Visible channel and
four spatial frequencies spaced by half octaves were used for the IR channel. Each spatial
frequency was represented by a quadrature phase pair in six orientations. The
locality-preserving nature of the 2D Gabor has been particularly useful for increased accuracy in
texture detection.
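
As an illustration of equation (1) and of the half-octave filter family, the following minimal sketch evaluates the complex Gabor response at a point and enumerates modulation parameters for a bank of six frequencies at six orientations. The default parameter values, the highest frequency `base_freq`, and the function names are assumptions chosen for illustration; they are not the values used in the experiments.

```python
import cmath
import math


def gabor(x, y, x0=0.0, y0=0.0, a=0.1, b=0.1, u0=0.1, v0=0.0):
    """Complex 2D Gabor value at (x, y), following the form of equation (1).

    The real and imaginary parts form the quadrature phase pair.
    """
    envelope = math.exp(-math.pi * ((x - x0) ** 2 * a ** 2 + (y - y0) ** 2 * b ** 2))
    carrier = cmath.exp(-2j * math.pi * (u0 * (x - x0) + v0 * (y - y0)))
    return envelope * carrier


def filter_bank(base_freq=0.25, n_freqs=6, n_orients=6):
    """Return (u0, v0) pairs: frequencies a half octave apart, orientations evenly spaced."""
    bank = []
    for k in range(n_freqs):
        freq = base_freq / (2 ** (k / 2))  # a half-octave step is a factor of sqrt(2)
        for j in range(n_orients):
            theta = math.pi * j / n_orients
            bank.append((freq * math.cos(theta), freq * math.sin(theta)))
    return bank
```

Convolving an image with the real and imaginary parts of each kernel gives one quadrature pair per frequency and orientation.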

2.1.1  Generalized  Gabor  Representation 

Small  sample  sets  might  not  have  examples  of  each  cloud  type  at  all  orientations.  To 
eliminate  the  possible  detrimental  effects  on  classifier  generalization  performance,  a 
compressed  representation  which  eliminates  orientation  specificity  was  designed. 


Footnote 1. Gabors can form a quasi-orthogonal basis with appropriate spacing.
The data in the infrared channel has a very low frequency component which was judged
to  be  of  marginal  utility  to  the  classification  of  the  current  cloud  types.  The  lowest  frequency 
data  in  the  visible  image  was  similarly  ignored.  Thus,  six  orientations  of  Gabor  wavelets  at 
four  spatial  scales  were  used  for  the  visible  image  and  no  Gabor  data  was  used  for  the 
infrared  image.  To  generalize  the  remaining  data,  the  six  oriented  wavelet  responses  at  each 
spatial  scale  were  replaced  by  two  values.  The  first  is  the  root  mean  square  value  of  the 
quadrature  pair  envelopes  of  all  the  oriented  responses  at  a  pixel  location.  This  provides 
information  as  to  the  extent  that  clouds  have  texture  at  this  spatial  scale.  The  second 
representation  indicates  the  degree  to  which  the  oriented  response  was  localized  or  not.  In 
this  representation,  the  rms  value  is  scaled  by  a  variable  which  takes  a  value  between  +1  and 
-1, where +1 occurs if many striation angles were detected and -1 occurs where only one
striation  angle  was  detected.  Thus  the  reduced  representation  indicates  the  degree  to  which  a 
cloud  sample  has  a  specific  striation  at  some  orientation  angle,  but  it  eliminates  the  particular 
angle(s)  from  the  data.  Examples  of  the  reduced  representation  for  one  spatial  scale  can  be 
seen  in  figures  3  and  4.  Eliminating  the  irrelevant  angle  information  greatly  improved 
generalization  performance. 
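
The reduction from six oriented responses to two values per spatial scale might be sketched as below. The root mean square of the quadrature envelopes follows the description above. The text does not give the formula for the scaling variable, so the spread measure used here (based on the share of energy in the strongest orientation) is purely an assumption that reproduces the stated end points of +1 and -1.

```python
import math


def generalized_gabor(pairs):
    """Reduce oriented responses at one pixel and one spatial scale to two values.

    pairs: list of (even, odd) quadrature responses, one per orientation (at least two).
    Returns (rms, oriented), where oriented = rms scaled by a value in [-1, +1].
    """
    envelopes = [math.hypot(even, odd) for even, odd in pairs]
    n = len(envelopes)
    total = sum(e * e for e in envelopes)
    rms = math.sqrt(total / n)
    if total == 0.0:
        return 0.0, 0.0
    # ASSUMED spread measure: energy share of the strongest orientation, rescaled so
    # that equal energy at all angles gives +1 and a single active angle gives -1.
    top_share = max(e * e for e in envelopes) / total
    spread = 1.0 - 2.0 * (top_share - 1.0 / n) / (1.0 - 1.0 / n)
    return rms, rms * spread
```

Neither output depends on which orientation carried the energy, which is the property the reduced representation is meant to have.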
2.2 MORPHOLOGY

The Boundary Contour System (BCS) and Feature Contour System (FCS) models [4,5]
are  based  primarily  on  psychophysical  data.  The  BCS/FCS  combination  explains  a  large 
body  of  psychophysical  data,  and  the  elements  of  the  model  correspond  closely  to 
neurophysiological  data  about  the  visual  cortex.  Efficient  versions  of  the  BCS  and  FCS  have 
been  implemented  for  the  purpose  of  reliably  determining  coherent  regions  in  images 
corresponding  to  meteorological  phenomena.  The  BCS  determines  boundaries  and  the  FCS 
constructs  regions  from  them.  The  regions  then  provide  morphology  information  to  the 
classifier. 


2.2.1  Boundary  Contour  System  (BCS)  Implementation 

On-center  Off-surround  Processing:  The  first  stage  of  processing  used  a  convolving 
filter  constructed  of  a  difference  of  two  Gaussian  filters.  Such  a  filter  closely  approximates 
the  spatial  second  derivative  operation.  The  product  of  this  filtering  serves  as  the  input  to 
both  the  BCS  and  FCS  systems. 
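
A difference-of-Gaussians kernel of the kind described for the on-center off-surround stage can be built as in the following sketch. The kernel size and the two standard deviations are assumed values for illustration only.

```python
import math


def gaussian_kernel(size, sigma):
    """Normalized 2D Gaussian on a size x size grid (size assumed odd)."""
    half = size // 2
    kernel = [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / (2 * sigma ** 2))
               for x in range(size)] for y in range(size)]
    total = sum(map(sum, kernel))
    return [[value / total for value in row] for row in kernel]


def dog_kernel(size=9, sigma_center=1.0, sigma_surround=2.0):
    """Narrow (center) Gaussian minus wide (surround) Gaussian."""
    center = gaussian_kernel(size, sigma_center)
    surround = gaussian_kernel(size, sigma_surround)
    return [[center[y][x] - surround[y][x] for x in range(size)] for y in range(size)]
```

Because both Gaussians are normalized to unit sum, the kernel sums to zero, so uniform image regions produce no response and only local contrast passes through.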


Oriented  Edge  Detection:  In  our  simulations  we  constructed  convolving  filters  from 
the  difference  of  two  Gaussians  of  equal  dimensions  but  with  offset  centers.  We  used  six 
oriented  filters:  separation  of  the  imagery  into  distinct  orientation  channels  allows 
orientation-specific  processing  to  be  performed  on  each  channel  separately  before  the 
information  is  then  recombined  into  a  multi-orientation  representation.  This  is  an  essential 
aspect  of  the  algorithm  which  allows  for  certain  powerful  operations  which  could  not  be 
performed  on  the  image  as  a  whole. 

First  Competitive  Stage:  An  edge  enhancement  operation  is  performed  on  each 
orientation  plane. 


Second  Competitive  Stage:  The  next  step  in  the  model  is  a  second  competitive  stage 
wherein  competition  among  all  orientation  channels  occurs  at  each  spatial  location. 


Oriented  Cooperation:  A  cooperative  operation  is  performed  in  each  orientation 
channel  to  complete  broken  or  incomplete  edges.  An  oriented  two-armed  filter  is  convolved 
with  each  orientation  plane,  and  a  conjunction  operation  between  the  two  arms  produces  a 
response  in  the  filter  only  while  it  is  straddled  between  aligned  points.  The  family  of 
cooperative  filters  is  described  by  the  following  equation: 


F^{(r)}_{x,y} = \pm \exp\left[ -2 \left( \frac{\sqrt{x^2 + y^2}}{P} - 1 \right)^2 \right] \cos\left( \left| \operatorname{atan}(y/x) \right| - r \right)    (2)

where F^{(r)}_{x,y} is the filter value at x,y for orientation r, r = 0.5, and P = 9.

Feedback Loop: The images generated via the oriented filtering are recombined to
close  a  large  feedback  loop.  This  feedback  allows  the  results  of  the  oriented  cooperation  to 
contribute  to  the  oriented  competition  of  the  second  competitive  stage  to  perhaps  shift  the 
emphasis  between  oriented  responses  where  appropriate. 

2.2.2  Feature  Contour  System  (FCS)  Implementation 

The  region  filling-in  function  of  the  Feature  Contour  System  (FCS)  begins  with  the 
same  on-center  off-surround  image  used  by  the  BCS.  The  FCS  allows  color  to  diffuse  freely 
in  all  directions  within  a  region  until  it  reaches  strong  boundaries  developed  by  the  BCS  and 
exhibits  properties  observed  in  psychophysical  data.  In  our  system,  the  FCS  is  run  at  a 
number  of  spatial  scales,  providing  morphology  information  to  the  classification  system.  The 
multi-spatial  scale  results  for  the  image  of  figure  2  appear  in  figures  5-8. 
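
The filling-in behavior can be illustrated with a one-dimensional boundary-gated diffusion. This is a strong simplification of the FCS, assuming a single image row, a fixed number of iterations, and a boundary signal already scaled to the range 0 to 1; the function name and parameter values are hypothetical.

```python
def fill_in(signal, boundary, steps=200, rate=0.25):
    """Diffuse `signal` along a row, with flow between cells i and i+1 gated by boundary[i].

    boundary[i] = 1 blocks all flow between neighbors i and i+1; 0 lets it pass freely.
    """
    values = list(signal)
    for _ in range(steps):
        updated = values[:]
        for i in range(len(values) - 1):
            flow = rate * (1.0 - boundary[i]) * (values[i + 1] - values[i])
            updated[i] += flow
            updated[i + 1] -= flow
        values = updated
    return values
```

Each region enclosed by strong boundaries settles to its own mean value, which is what lets a region-level quantity be read off at any pixel inside the region.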

Morphology  is  represented  by  sampling  each  of  the  spatial  scales  at  a  pixel  location. 
Small  clouds  in  relative  isolation  will  only  activate  the  highest  spatial  frequency  FCS  image. 
Small  clouds  that  are  part  of  larger  cloud  masses  will  activate  FCS  images  at  lower  spatial 
frequencies  as  well.  Large  mixed  cloud  masses  will  activate  all  of  the  FCS  scales.  Large 
smooth  cloud  masses  will  activate  only  the  largest  scale,  etc. 

We  ran  a  number  of  experiments  that  were  identical  except  that  some  had  the  BCS 
morphology  information  and  some  did  not.  We  found  that  the  inclusion  of  the  morphology 
information  added  a  nine  percent  average  performance  improvement. 
SECTION 3


EXPERIMENTAL  PREPARATION 


3.1  LAND/SEA  MAPS 

The  infrared  information  is  considerably  different  for  land  and  ocean  samples. 
Consequently,  separation  of  the  data  into  two  separate  tasks  made  sense  to  ease  the  learning 
task.  Two  neural  networks  were  trained  for  each  experiment:  one  for  land  and  one  for  sea. 
We  ran  a  number  of  experiments  that  showed  improved  learning  and  generalization 
performance  using  the  separate  data.  In  an  eventual  implementation,  this  approach  could  be 
extended  to  a  range  of  different  terrain  types,  e.g.,  snow,  desert,  etc.  Because  the  satellite 
was  moving  east  at  the  time  of  data  collection,  the  best  scenario  would  require  a  map  that 
showed  which  image  pixels  were  over  land  or  sea  for  each  day.  The  Phillips  Laboratory, 
Geophysics  Directorate  provided  MITRE  with  three  map  overlays  of  the  northeast  United 
States  spaced  evenly  over  the  recording  period.  From  these  we  generated  three  "land/sea" 
maps  which  indicate  the  type  of  background  (land  or  sea)  under  each  pixel  for  a  third  of  the 
training days. To compensate for the uncertainty due to the satellite drift, an "undefined"
band hugging all land masses was introduced, and the samples there were thrown out. This
turned  out  to  not  be  a  problem  because  the  database  has  so  many  pixel  samples  per  day,  for 
most  cloud  types,  that  the  numbers  of  deleted  data  points  were  insignificant.  In  an  ideal 
implementation,  the  land/sea  information  would  be  available  for  each  specified  time  period 
and no undefined regions would be required. The land/sea map for June 10, 1991 at 1700
GMT  appears  in  figure  9. 


3.2  FEATURE  VECTOR  GENERATION 

Feature  vectors  for  training  and  testing  used  the  format  shown  in  column  one  of  table  1. 
This  format  incorporated  the  Generalized  Gabor  representation  described  in  section  2.1.1. 
Each  feature  vector  represents  data  at  one  of  the  pixel  locations  identified  as  a  particular 
cloud  type  by  the  meteorologists.  Each  data  type  is  represented  by  a  separate  image  plane. 
For  each  pixel  location,  there  is  a  corresponding  feature  value  for  each  of  the  data  types  in 
column  one.  To  normalize  the  data,  the  mean  and  standard  deviation  of  each  data  type  were 
computed  using  all  identified  samples  in  the  database  across  all  27  available  days  at  1700 
GMT.  Data  for  land  and  sea  were  isolated  using  the  land/sea  maps  and  two  sets  of  statistics 
were calculated as shown in table 1. Separate normalized land and sea training/testing sets
were  generated  by  subtracting  out  the  appropriate  mean  and  dividing  by  the  appropriate 
standard  deviation  for  each  data  type  at  a  subset  of  identified  pixel  locations. 
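
The normalization step amounts to a per-data-type z-score with separate land and sea statistics. The sketch below uses the Visible and IR entries of table 1; the dictionary layout and function name are assumptions for illustration.

```python
# (mean, standard deviation) for two of the fourteen data types, taken from table 1.
STATS = {
    "land": {"Visible": (110.083, 40.184), "IR": (98.0063, 34.3616)},
    "sea": {"Visible": (73.4685, 43.7181), "IR": (104.209, 31.5344)},
}


def normalize(features, background):
    """Subtract the mean and divide by the standard deviation for each data type.

    features: {data_type: raw value} for one pixel; background: "land" or "sea".
    """
    stats = STATS[background]
    return {name: (value - stats[name][0]) / stats[name][1]
            for name, value in features.items()}
```

For example, a land pixel with a Visible value equal to the land mean maps to 0, and a sea pixel one standard deviation above the sea IR mean maps to 1.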

The  available  number  of  samples  across  the  27  days  was  too  large  to  include  all  of 
them.  Moreover,  the  distributions  of  cloud  types  varied  widely  over  the  different  days.  In  an 
Figure 9. Land/Sea map for June 10, 1991 at 1700 GMT

so  much  in  this  27  day  set,  it  was  difficult  to  achieve  a  perfectly  even  distribution.  For 
example,  when  one  cloud  type  appeared  on  only  two  days  and  another  appeared  on  23  days, 
it did not make sense to oversample the two days to match the 23 day sample because no
new  information  would  be  added  while  enormous  amounts  of  memory  and  training  cycles 
would  be  consumed.  We  compromised  by  using  even  distributions  among  the  present  cloud 
types  within  a  given  day.  The  training  set  derived  by  concatenating  the  training  sets  from 
many  days  results  in  an  uneven  distribution.  With  a  larger  number  of  days,  it  would  be 
possible  to  obtain  a  training  set  which  removes  the  biases  towards  cloud  types  that  occurred 
more  frequently.  For  this  experiment,  we  used  200  random  samples  of  each  cloud  type  from 
each day. If there were fewer than 200 samples present, then the samples present were
oversampled  to  obtain  200  vectors.  The  number  of  cloud  types  present  at  a  given  hour 
typically varied from one to five. Typical training set sizes were on the order of 14000
vectors.  Typical  testing  set  sizes  were  on  the  order  of  800  vectors. 
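
The per-day sampling rule can be sketched as follows. The text does not say how the oversampling was done, so drawing the extra vectors at random with replacement is an assumption, as are the function name and the fixed random seed.

```python
import random


def sample_day(samples_by_type, per_type=200, rng=None):
    """Draw `per_type` feature vectors for each cloud type present on one day.

    samples_by_type: {cloud_type: list of feature vectors} for a single day.
    Returns a list of (cloud_type, vector) pairs.
    """
    rng = rng or random.Random(0)
    drawn = []
    for cloud_type, vectors in samples_by_type.items():
        if not vectors:
            continue  # cloud type not present on this day
        if len(vectors) >= per_type:
            picks = rng.sample(vectors, per_type)  # random subset, no repeats
        else:
            # ASSUMED oversampling: keep every sample, then repeat random ones.
            extra = [rng.choice(vectors) for _ in range(per_type - len(vectors))]
            picks = list(vectors) + extra
        drawn.extend((cloud_type, vector) for vector in picks)
    return drawn
```

Concatenating the output over the training days gives an even distribution within each day but, as noted above, an uneven one overall, since frequent cloud types contribute on more days.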

Table 1. Normalization Statistics

                 Land                     Sea
Data Type        Mean       Std. Dev.     Mean       Std. Dev.
Visible          110.083    40.184        73.4685    43.7181
IR               98.0063    34.3616       104.209    31.5344
Gabor 1 RMS      353.883    479.91        178.765    968.264
Gabor 1 Ori      104.394    164.669       25.5906    289.525
Gabor 2 RMS      173.32     241.084       82.0621    616.953
Gabor 2 Ori      49.6255    82.418        30.0943    374.383
Gabor 3 RMS      86.8992    112.554       37.963     329.347
Gabor 3 Ori      13.6657    61.7071       16.589     215.911
Gabor 4 RMS      39.912     41.7574       16.7504    143.935
Gabor 4 Ori      0.564122   25.1688       7.6826     107.389
Morphology 1     3.92518    19.9179       0.133793   4.66539
Morphology 2     4.58013    21.6316       0.422905   6.87784
Morphology 3     369.515    985.301       115.875    583.83
Morphology 4     376.915    988.732       162.078    1000.77

3.3  CLASSIFIER  TRAINING  METHOD 

For  this  experiment,  layered  feedforward  neural  networks  with  sigmoid  non-linearities 
were  trained  to  perform  the  cloud  classification  task.  Initial  architecture  estimates  for  a  few 
days  were  determined  using  a  layered  constructive  learning  algorithm  [10]  developed  at 
MITRE  which  only  added  hidden  units  that  improved  generalization  performance.  It  was 
found  that  the  computational  requirements  for  the  constructive  algorithm  were  prohibitive  for 
the  large  experiment  because  of  the  overhead  required  to  search  for  node  candidates  that 
improved  generalization.  Instead,  we  used  the  configuration  of  nodes  and  connectivity  (but 
not  the  weight  values)  of  the  best  performing  of  these  architectures  as  the  starting  estimate 
for  all  of  the  experimental  trials  using  a  subtractive  learning  algorithm.  This  type  of  learning 
algorithm  deletes  connections  and  nodes  to  further  tune  the  architecture  and  train  the  weights 
to  the  individual  learning  tasks.  The  standard  starting  network  architecture  had  14  input 
units,  eight  output  units,  eight  fully  connected  hidden  units  in  a  first  layer,  and  four  fully 
connected  hidden  units  in  a  second  layer.  Each  hidden  unit  was  connected  to  all  of  the  output 
units. The subtractive training algorithm used was a modified backpropagation learning rule
described  in  the  following. 
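
For a sense of scale, the number of weights in the starting architecture can be counted as in the sketch below. It assumes one reading of the description: inputs feed the first hidden layer, the first hidden layer feeds the second, and both hidden layers feed the outputs. Whether bias weights were counted, and whether any other connections existed, is not stated, so the totals are illustrative only.

```python
LAYERS = {"input": 14, "hidden1": 8, "hidden2": 4, "output": 8}

# ASSUMED connectivity: each layer feeds the next, and every hidden unit in
# either hidden layer also connects to every output unit.
CONNECTIONS = [
    ("input", "hidden1"),
    ("hidden1", "hidden2"),
    ("hidden1", "output"),
    ("hidden2", "output"),
]


def weight_count(layers=LAYERS, connections=CONNECTIONS, biases=True):
    """Count connection weights, optionally adding one bias per non-input unit."""
    weights = sum(layers[src] * layers[dst] for src, dst in connections)
    if biases:
        weights += sum(n for name, n in layers.items() if name != "input")
    return weights
```

Under these assumptions the starting network has 240 connection weights, or 260 with biases.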


As  this  experiment's  goal  was  to  explore  the  cross-validation  capability  of  neural 
network  classifiers,  we  sought  to  maximize  the  generalization  performance  of  the  classifier. 
Vapnik's  theory  of  the  VC  Dimension  [12]  of  a  function  approximator  considers  the 
maximum  number  H  of  vectors  that  can  be  shattered  in  an  N-dimensional  space.  The  theory 
states  that  as  the  number  of  training  samples  M  gets  small  relative  to  H,  the  probability  of 
generalization  error  goes  up.  To  improve  the  generalization  performance  of  the  classifier 
function  we  are  attempting  to  approximate,  we  try  to  reduce  H  (i.e.,  we  reduce  the  number 
and  complexity  of  classification  boundaries  to  avoid  fitting  to  noise)  and/or  increase  M,  if 
possible. 


The  first  step  we  took  to  reduce  the  VC  dimension  was  to  reduce  the  number  of  weights 
in our networks. We implemented Weigend, Rumelhart, and Huberman's Weight
Elimination method [9], which drives less important weights to small values by adding a
penalty term to the backpropagation learning cost function. The resulting cost function is
equation three.
This cost function trades off a weight's contribution to reducing the sum squared error
against the magnitude of the weight. A weight that is large and not contributing enough to the
reduction of error is diminished. We used a value of 1.0 for w0. Our implementation
periodically  pruned  small  weights  from  the  network.  If  nodes  became  completely 
disconnected  they  were  removed  from  the  network.  Removing  the  less  important  weights 
reduces  the  ability  of  the  network  to  tune  to  noise  in  the  training  set  and  thus  improves 
generalization  performance.  Removal  of  130  or  more  weights  was  typical. 
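
The penalty term can be sketched as follows. This uses the commonly published form of the weight-elimination penalty, lam * sum((w/w0)^2 / (1 + (w/w0)^2)), which is assumed here to match equation three. The value w0 = 1.0 comes from the text; the penalty strength `lam`, the pruning `threshold`, and all function names are hypothetical choices for illustration.

```python
def elimination_penalty(weights, w0=1.0, lam=0.001):
    """Penalty lam * sum((w/w0)^2 / (1 + (w/w0)^2)) over all weights."""
    total = 0.0
    for w in weights:
        ratio = (w / w0) ** 2
        total += ratio / (1.0 + ratio)
    return lam * total


def elimination_gradient(w, w0=1.0, lam=0.001):
    """Derivative of one weight's penalty term with respect to that weight."""
    ratio = (w / w0) ** 2
    return lam * (2.0 * w / w0 ** 2) / (1.0 + ratio) ** 2


def keep_mask(weights, threshold=0.05):
    """True for weights that survive pruning, False for those removed."""
    return [abs(w) >= threshold for w in weights]


weights = [2.5, -0.01, 0.4, 0.002]
print(elimination_penalty(weights))
print(keep_mask(weights))  # [True, False, True, False]
```

Each term of the penalty saturates near `lam` for weights much larger than w0 and behaves like a quadratic decay for weights much smaller than w0, so small weights are pushed toward zero while large, useful weights are only mildly penalized.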


The  next  step  to  reduce  the  VC  dimension  was  to  reduce  the  dimensions  of  input 
vectors. In the cloud task, a reduced Gabor wavelet representation was implemented, which
shrank the size of the vector from 66 to 14 real values. The final step we took did not reduce
the  VC  dimension  H,  but  rather  improved  generalization  performance  by  increasing  the 
number  of  training  samples  M  by  a  factor  of  eight  (see  Feature  Vector  Generalization, 
above).  These  steps  improved  generalization  performance  on  the  cloud  task  by  20-30 
percent,  depending  on  the  training  set. 


To  improve  the  training  performance,  we  implemented  the  Delta-Bar-Delta  learning 
update  rule  [7]  developed  by  Jacobs  and  Sutton.  The  new  rule  uses  a  separate  learning 
constant  for  each  weight  in  the  network  and  keeps  an  exponentially  decaying  history  of  the 
error  attributed  to  each  weight.  Thus  the  rate  at  which  each  weight  is  updated  is  based  on  a 
local  estimate  of  its  performance  instead  of  the  performance  of  the  network  as  a  whole. 
Keeping statistics for ten trials on a simple problem, we found that the average number of training
epochs decreased from 659 to 49 (a 13-fold improvement) using the Delta-Bar-Delta learning
rule.
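
A minimal sketch of a per-weight Delta-Bar-Delta step is given below, following the usual published form of the rule: the learning rate grows additively when the current gradient agrees in sign with the decayed average of past gradients, and shrinks multiplicatively when it disagrees. The constants `kappa`, `phi`, and `theta` and the function name are assumptions; the report does not state the values that were used.

```python
def delta_bar_delta_step(w, lr, bar, grad, kappa=0.01, phi=0.1, theta=0.7):
    """Update one weight.

    w    : current weight value
    lr   : this weight's own learning rate
    bar  : exponentially decaying average of this weight's past gradients
    grad : current gradient dE/dw for this weight
    """
    agreement = bar * grad
    if agreement > 0:
        lr = lr + kappa            # consistent direction: speed up additively
    elif agreement < 0:
        lr = lr * (1.0 - phi)      # sign change: slow down multiplicatively
    w = w - lr * grad              # gradient step with the per-weight rate
    bar = (1.0 - theta) * grad + theta * bar
    return w, lr, bar


# Gradient agrees with the history: the learning rate grows to about 0.11.
print(delta_bar_delta_step(0.0, 0.1, 1.0, 1.0))
# Gradient opposes the history: the learning rate shrinks to about 0.09.
print(delta_bar_delta_step(0.0, 0.1, 1.0, -1.0))
```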


SECTION 4


EXPERIMENTAL  RESULTS 


4.1  SINGLE  TIME,  MULTIPLE  DAY  EXPERIMENTS 

We ran 54 leave-one-day-out experiments to test the generalization properties of the
neural  network  method  described  above.  For  each  of  the  27  days  of  data  available  at  1700 
GMT,  training  sets  were  constructed  by  concatenating  all  land  or  sea  samples  available  on 
the  remaining  26  days  at  1700  GMT.  The  samples  available  for  each  day  at  1700  comprised 
the  testing  set  for  that  day.  Two  feedforward  neural  networks  were  trained  for  each  day,  one 
for  land  and  one  for  sea.  The  training  set  and  testing  set  results  appear  in  table  2.  All  of  the 
neural  networks  for  this  experiment  were  trained  in  parallel  on  separate  Sun  workstations 
using the MITRE Batch distributed computing system. Confusion matrices for these runs
appear in appendix I of this report (see footnote 2).
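
The leave-one-day-out construction described above can be sketched as follows. The dictionary layout, the sample labels, and the function name are hypothetical; only the splitting logic reflects the text. Running the generator once over the 27 days of land samples and once over the 27 days of sea samples gives the 54 experiments.

```python
def leave_one_day_out(samples_by_day):
    """Yield (held_out_day, training_samples, testing_samples) for each day."""
    for held_out in samples_by_day:
        train = []
        for day, samples in samples_by_day.items():
            if day != held_out:
                train.extend(samples)   # concatenate all other days
        yield held_out, train, list(samples_by_day[held_out])


# Toy data: three days instead of 27, with placeholder sample labels.
toy_land = {"6/03": ["a1", "a2"], "6/04": ["b1"], "6/05": ["c1", "c2", "c3"]}
splits = list(leave_one_day_out(toy_land))
print(len(splits))  # 3
```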


4.2  ANALYSIS 

The  neural  networks  used  in  this  experiment  were  trained  using  stochastic  learning 
algorithms.  As  such,  some  of  the  networks  achieved  better  results,  either  because  the  initial 
random  weight  settings  of  the  network  were  well  suited  to  the  target  function  or  because  the 
learning  had  avoided  falling  into  local  minima.  Results  also  vary  because  of  the  synoptic 
conditions  present  each  day.  In  an  ideal  setting,  each  of  the  experiments  would  have  been 
run  many  times  and  the  results  averaged  to  reduce  the  variance  due  to  neural  network 
training. That could not be done because of the large computational requirements for this
experiment.  Instead,  the  mean  results  of  the  single  run  experiments  over  the  27  different 
days  are  provided  in  table  2.  These  results  are  influenced  by  both  training  effects  and  actual 
performance  on  variable  data  conditions.  Thus,  an  optimized  training  situation  should  be 
able  to  improve  on  the  mean  testing  performance  reported  here. 

The neural network training simulator updated the results whenever the testing set
performance improved. This explains why the training set performance varies so much from
day to day even though only 1/26th of the training set differs between any two days. For test
sets  such  as  Sea  6/07/91  1700  GMT,  only  one  cloud  type  (clear)  was  present,  and  hence  the 
network  could  attain  excellent  testing  set  performance  very  early  in  the  training,  when  the 
training  set  performance  was  only  50.39  percent  correct. 
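
The reporting behavior described above can be sketched as follows. The function name and the strict-improvement comparison are assumptions about the simulator; the first history entry uses the Sea 6/07/91 figures from the text, and the later entries are made-up values for illustration only.

```python
def best_test_snapshot(history):
    """Return the (epoch, train_pct, test_pct) record that would be reported.

    A record replaces the stored one only when testing performance
    strictly improves, so later training gains are not recorded once
    the test score can no longer rise.
    """
    best = None
    for epoch, train_pct, test_pct in history:
        if best is None or test_pct > best[2]:
            best = (epoch, train_pct, test_pct)
    return best


# Hypothetical run: the test set is saturated at 100% from the first record.
history = [(1, 50.39, 100.0), (2, 70.00, 100.0), (3, 85.00, 100.0)]
print(best_test_snapshot(history))  # (1, 50.39, 100.0)
```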


2  The  numbering  scheme  is  zero-based  for  all  confusion  matrices  presented  in  the 
appendices  of  this  report.  Thus,  cloud  type  1  corresponds  to  goal  0  and  output  0, 
cloud type 2 corresponds to goal 1 and output 1, etc.

Table 2. Single Time, Multiple Day Results (% Correct)

             Land                    Sea
1991 Date    Training    Testing     Training    Testing
6/03         81.08       63.10       78.55       70.00
6/04         84.13       79.00       91.73       89.33
6/05         85.00       66.87       93.00       75.10
6/06         85.75       81.87       85.28       82.75
6/07         57.35       86.00       50.39       100.00
6/10         79.27       90.92       83.89       93.80
6/11         78.49       64.60       77.67       78.00
6/12         82.31       77.60       81.73       92.83
6/13         69.37       57.12       83.90       84.17
6/14         73.85       76.66       78.04       87.00
6/17         81.67       63.10       63.56       84.25
6/19         67.57       86.33       89.63       89.75
6/20         63.49       92.00       48.56       100.00
6/21         82.13       68.50       85.89       94.83
6/24         86.55       90.50       47.33       100.00
6/25         53.19       74.00       50.63       100.00
6/26         84.58       76.08       76.38       72.33
6/27         66.60       71.16       80.73       66.50
7/01         64.44       56.37       82.26       70.50
7/08         83.25       86.25       76.86       73.83
7/13         86.75       74.00       86.69       64.87
7/14         79.30       94.25       78.59       95.25
7/15         56.48       84.17       47.58       100.00
7/16         59.96       90.00       63.77       85.33
7/17         61.91       75.50       49.14       100.00
7/18         82.90       98.87       n/a         n/a
7/19         64.82       77.50       17.41       51.25
Mean         74.15       77.86       71.12       84.68

The correspondence of samples in the training set to those in the test sets must be
carefully noted in assessing experimental results. In some cases, cloud types appearing on
the test day did not occur during the other 26 days at that particular time. In other cases, the
cloud type may have appeared during the other 26 days, but the synoptic activity might be
sufficiently different that the examples of a single cloud type differ greatly. The confusion
matrix allows us to determine exactly how cloud types are misclassified. It also allows us to
determine the cloud types present in the training and test sets.

Classification  probabilities  were  computed  using  the  27  testing  set  confusion  matrices 
produced in the 1700 GMT experiment (see footnote 3) for both land and sea. Each value is computed by
summing  the  number  in  the  same  matrix  position  for  each  experiment's  testing  set  confusion 
matrix  and  dividing  that  sum  by  the  total  number  of  samples  that  occurred  for  that  cloud  type 
(i.e.,  in  that  row).  These  results  appear  in  tables  3  and  4.  Because  many  cloud  types  occur  on 
only  a  few  days  in  the  data  set,  it  is  likely  that  outliers  are  having  a  great  effect  on  the 
numbers  that  appear  in  these  tables.  As  indicated  above,  some  of  the  networks  will  have 
become  stuck  in  local  minima  and  so  those  results  would  not  be  indicative  of  the  method’s 
potential  performance.  Given  that  some  cloud  types  only  occur  on  a  few  days,  one  bad 
network  could  throw  off  the  probabilities  considerably.  Thus,  given  the  small  number  of 
sample  days,  these  numbers  should  not  be  considered  as  absolute  probabilities.  Rather  they 
should  be  seen  as  indicating  where  the  confusions  are  likely  to  occur  with  this  method  along 
with  a  weak  probability  estimate.  The  reader  can  examine  the  individual  matrices  in 
appendix  I  to  determine  the  relative  influence  of  the  individual  experiments  on  these 
numbers. 
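
The pooling step described above can be sketched as follows, with rows taken as the human label and columns as the machine label. The function name and the toy matrices are hypothetical, and reporting a class with no samples as a row of zeros is a choice made here, not something the report specifies.

```python
def classification_probabilities(matrices):
    """Sum confusion matrices element-wise, then normalize each row."""
    n = len(matrices[0])
    summed = [[sum(m[i][j] for m in matrices) for j in range(n)]
              for i in range(n)]
    probs = []
    for row in summed:
        total = sum(row)  # all samples of this human-labeled cloud type
        probs.append([count / total if total else 0.0 for count in row])
    return probs


# Two toy 2-class test-set confusion matrices from two hypothetical days.
day_a = [[3, 1], [0, 0]]
day_b = [[1, 3], [0, 4]]
print(classification_probabilities([day_a, day_b]))
# [[0.5, 0.5], [0.0, 1.0]]
```

Because the counts are pooled before normalizing, a day that contributes many samples of a cloud type has more influence on that row than a day that contributes few.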

For land, Small Scattered Cumulus (cl. 1) was confused most often with Cumulus
(cl.  2)  and  less  so  with  Stratocumulus  (cl.  6)  and  Clear  (cl.  8).  It  is  quite  possible  that  clear 
samples  that  occur  between  small  scattered  cumulus  clouds  have  the  same  Gabor  responses 
as  Small  Scattered  Cumulus  due  to  sampling  error  introduced  by  the  size  of  the  wavelets.  It 
may  be  possible  to  avoid  this  by  introducing  a  new  class  called  clear-between-small-cumulus. 
For Cumulus (cl. 2) there is confusion with Small Scattered Cumulus (cl. 1) in the land
case and with Stratocumulus (cl. 6) for land and sea. Thin Cirrus (cl. 3) was most often confused
with  Clear  (cl.  8)  for  sea.  This  may  be  due  to  the  fact  that  the  thinnest  cirrus  clouds  and  clear 
actually  appear  quite  similar  in  the  data  representation  used  here:  the  Gabor  wavelet 
responses  may  be  minimal  and  enough  of  the  IR  value  may  be  due  to  the  background  to  make 
the  samples  similar.  Thin  Cirrus  over  cloud  (cl.  5)  was  most  often  confused  with  Cumulus 
(cl.  2),  but  was  also  confused  with  Cirrus  (cl.  4)  and  Stratocumulus  (cl.  6).  Here,  the  cloud 
that  is  in  the  background  may  be  giving  a  stronger  response  than  the  thin  cirrus  clouds  above. 
Stratocumulus  (cl.  6)  was  most  often  confused  with  Cumulus  (cl.  2).  The  technique  had  the 
most  trouble  discriminating  between  Altocumulus  (cl.  7)  and  Stratocumulus  (cl.  6).  While 
this effect is present in the sea results, it is far less pronounced, and some aspect of the
background  or  climate  may  be  making  these  classes  look  more  similar  over  land.  The 
technique  was  able  to  classify  clear  well  on  both  land  and  sea. 


3  For  the  sea  table,  the  results  for  7/19  at  1700  GMT  were  left  out.  The  network  appears  to 
have fallen into a poor local minimum and is thus considered an outlier for this study.

Table 3. Land Cross-Validation Classification Probabilities for
27 Day 1700 GMT Experiment

                                Machine Labeling
Human Labeling                  1      2      3      4      5      6      7      8
Small Scattered Cumulus: 1     .73    .15    .00    .00    .00    .06    .00    .06
Cumulus: 2                     .06    .85    .00    .00    .01    .06    .01    .00
Thin Cirrus: 3                 .06    .00    .56    .02    .01    .00    .00    .35
Cirrus: 4                      .01    .00    .06    .75    .14    .01    .02    .01
Thin Cirrus over cloud: 5      .02    .13    .02    .09    .58    .09    .03    .03
Stratocumulus: 6               .00    .18    .00     ?      ?     .71    .07    .00
Altocumulus: 7                 .05    .08    .00    .03    .00    .51    .33    .01
Clear: 8                       .00    .00    .01    .00    .00    .00    .00    .99

(?: value missing in the source; in the Stratocumulus row, the position of the
surviving .00 among columns 3 to 5 is uncertain.)

Table 4. Sea Cross-Validation Classification Probabilities for
27 Day 1700 GMT Experiment

                                Machine Labeling
Human Labeling                  1      2      3      4      5      6      7      8
Small Scattered Cumulus: 1     .00    .00    .00    .00    .00    .00    .00    .00
Cumulus: 2                     .00    .77    .01    .00    .00    .13    .09    .00
Thin Cirrus: 3                 .00    .00    .65    .17    .03    .00    .00    .15
Cirrus: 4                      .00    .00    .14    .80    .04    .00    .01    .00
Thin Cirrus over cloud: 5      .00    .00    .04    .02    .81    .10    .03    .00
Stratocumulus: 6               .00    .14    .00    .01    .01    .76    .09    .00
Altocumulus: 7                 .00    .00    .00    .05    .02    .12    .80    .01
Clear: 8                       .00    .00    .01    .00    .00    .00    .00    .98

The  results  for  land  are  poorer  and  the  range  of  confusions  more  diverse  than  for  sea. 
The  difficulty  with  land  appears  to  be  consistent  with  other  cloud  studies  [3]  and  stems  from 
the  fact  that  the  cloud  systems  over  land  can  be  influenced  more  by  local  geography  (e.g., 
lakes, rivers, mountains) and are thus less homogeneous than those over the sea. In addition,
the background IR information will also tend to change with geography. There are a number
of possibilities for improving the land results. One approach would be to subdivide the land
areas by terrain type, with a separate training set for each. Thus, just as we now have
land and sea networks, coastal-land, mountain-land, and desert-land networks might allow
improved  performance.  Another  approach  would  collect  more  training  samples  for  land 
areas  to  capture  more  variability.  Yet  another  approach  would  group  land  training  samples 
on  a  monthly  basis. 

For  all  of  the  results,  there  is  the  potential  for  effects  due  to  human  judgment  error  in 
the  training  data.  Sampling  error  due  to  the  overlap  of  wavelet  kernels  at  transition  regions  is 
a  significant  factor  that  could  possibly  be  improved  through  the  use  of  a  different 
complement of wavelets. The relatively low resolution of the GOES data (1 km/pixel) may
be the cause of poor discrimination performance on certain cloud types having fine detail.
Other cloud studies using higher resolution imagery (e.g., Advanced Very High Resolution
Radiometer) may achieve better overall results because certain cloud types appear
similar at 1 km resolution. The technique used here can readily be extended to imagery with
higher  resolution. 


4.3  MULTIPLE  TIME  EXPERIMENTS 

A  key  question  to  answer  about  the  classification  techniques  used  in  this  study  concerns 
the  amount  of  training  data  that  would  be  required  to  scale  up  to  operational  use.  Because  the 
method is texture and IR temperature based, it was expected that changes in cloud shadows
during the day would affect the classification accuracy. Initial trials showed a performance
fall-off with test sets that differed greatly in time. The next set of experiments seeks to
estimate  the  range  of  time  that  can  be  accurately  covered  by  a  classifier  trained  at  a  given 
time,  and  thus  indicate  how  many  classifiers  would  need  to  be  trained  to  cover  a  full  day. 

The training set for 6/21/91 at 1700 from section 4.1 was used to train a classifier.
Thus the training set was constructed by concatenating all land or sea samples available on
the remaining 26 days at 1700 GMT. The classifier was then tested using data from 6/21/91
at 1430, 1500, 1600, 1630, 1730, and 1830 GMT. A more complete study would do a similar set of
runs  for  each  of  the  27  days  in  the  data  set  and  average  the  results  at  each  half  hour,  but 
resources  did  not  allow  that  in  this  study.  Hence  these  results  should  be  considered  as  a 
result  based  on  an  extremely  small  sample  which  may  be  biased.  Confusion  Matrices  for 
these  runs  appear  in  Appendix  I  of  this  report.  The  results  are  tabulated  in  table  5. 


Table 5. Multiple Time Results (% Correct)

6/21/91       Land                    Sea
Time (GMT)    Training    Testing     Training    Testing
1430          44.25       56.75       44.31       100.0
1500          65.17       76.25       57.46       100.0
1600          61.46       86.00       49.72       83.67
1630          53.63       38.50       69.52       77.12
1700          82.13       68.50       85.89       94.83
1730          70.12       58.65       66.69       99.5
1830          32.00       67.95       71.17       ?0.00

(?: digit illegible in the source.)

4.3.1  Analysis 

The testing set results for sea are significantly better at all times than those for land.
However, the main results here are that (i) for sea, performance can fall off by as much as
15-18% at an hour's difference from the training time, but may be better than that, and (ii) for land,
results fall off by 8-13% within a half hour of the training time and by up to 60% at one hour
out.


The  high  test  results  for  sea  at  1430  and  1500  are  due  to  a  single  class  (clear)  being 
present.  The  sea  test  result  of  83.70  at  1600  results  from  the  fact  that  only  two  classes  are 
present  besides  clear  and  the  method  does  best  for  those  classes.  Thus,  these  initial 
indications  appear  to  suggest  that  a  separate  classifier  would  be  needed  at  half  hour  intervals 
for  land  and  at  one  hour  or  forty-five  minute  intervals  for  sea.  One  explanation  is  that  the 
thermal  mass  of  the  sea  may  cause  slower  variation  in  the  infrared  than  on  land. 

The  possible  finding  that  performance  falls  off  at  half  hour  intervals  for  land  does  not 
indicate that the method cannot be scaled to operational use. Separate networks for each of the
times  could  be  trained.  It  is  quite  possible  that  hours  symmetric  around  Zenith  would  have 
similar  shadows  and  could  be  combined  into  a  single  classifier,  but  that  test  was  not 
performed  here.  Samples  from  nearby  times  could  also  be  mixed  in  the  training  set  to 
produce  a  classifier  that  spans  a  number  of  hours.  While  the  current  dataset  would  allow  this 
test,  it  was  not  performed  as  part  of  this  study. 


SECTION 5


DISCUSSION 


The  neural  network  classification  technique  presented  in  this  study  achieved  results 
better  than  those  achieved  by  Garand  for  sea  samples.  The  test  data  used  here  is  from 
completely  unseen  days,  which  strengthens  the  result.  The  fact  that  the  same  performance 
was  reached  without  any  analytic  physical  model  features  being  included  is  of  note.  Limiting 
the  complexity  and  number  of  such  models  can  reduce  the  potential  for  modeling  error  and 
increase  the  portability  of  the  resulting  system. 

Earlier studies [1,8] using high resolution imagery had better results than this study. It
is clear that the current technique would extend to higher resolution imagery and should
improve its performance considerably in that case. However, it should be noted that the
Bankert study [1] used data from only four images; the test data in that study may not have
been independent. The Lee study used only 3 classes; the 8 classes in the current study made
for  a  considerably  more  difficult  classification  problem.  As  Lee  does  not  specify  the  dates 
and  times  of  the  imagery  used,  it  is  impossible  to  judge  whether  the  criteria  for  independence 
of  that  study  meet  the  standards  used  here.  Addition  of  physical  models  to  the  current 
approach  may  make  sense  if  increased  resolution  does  not  improve  the  discrimination  of 
certain  cloud  types.  While  the  joint  decision  of  two  analysts  was  used  as  training  data, 
human  error  may  account  for  some  of  the  machine  classification  error. 

The techniques described here show promise for real time cloud classification (see footnote 4). The
representation  used  by  the  approach  is  compact  and  can  summarize  large  volumes  of  training 
data.  Finished  networks  could  be  trained  on  small  additional  amounts  of  local  training  data 
to  tune  them  to  special  local  conditions. 


4  Specialized  hardware  now  exists  which  can  run  these  algorithms  in  real  time  at  a 
reasonable  cost  on  standard  workstations. 


SECTION 6


FUTURE  WORK 


The  research  results  presented  here  show  the  method  works  quite  well  with  only  learned 
features  derived  from  neural  network  vision  processing.  There  are  several  ways  to  improve  the 
results.  First,  use  of  higher  resolution  imagery  is  recommended  to  allow  better  discrimination  of 
cloud  types  by  bringing  out  more  textural  details. 

Second,  we  noticed  that  the  meteorological  analysts  tended  to  look  at  the  sequence  of 
clouds  leading  up  to  the  current  hour.  The  existing  database  is  ideally  suited  for  a  study  which 
would  take  samples  of  a  cloud  in  preceding  hours  to  classify  its  type  at  the  current  hour. 

Third,  judicious  choice  of  a  few  models  of  cloud  physics  to  address  specific  discrimination 
problems  may  be  inserted  into  the  neural  network  model  as  done  in  [1]. 

Fourth,  performance  may  be  improved  in  discriminating  between  clear  and  thin  cirrus  by 
inserting a zero frequency Gabor (i.e., a Gaussian) into the set of preprocessed features. Since
cirrus typically has little spatial frequency information associated with it, but does have spatial
extent, the Gaussian may help to integrate the weak luminance values and provide a feature set
different  from  that  observed  for  clear. 
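
To illustrate why a zero frequency Gabor is a Gaussian, the sketch below evaluates a real (cosine-phase) Gabor function with an isotropic Gaussian envelope. The parameterization, the isotropic envelope, and the function names are assumptions made for illustration; the Gabor wavelets used in this study may be defined differently.

```python
import math


def gaussian(x, y, sigma):
    """Isotropic, unnormalized 2-D Gaussian envelope."""
    return math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))


def gabor(x, y, sigma, freq, theta=0.0):
    """Cosine-phase Gabor: Gaussian envelope times a sinusoidal carrier."""
    along_wave = x * math.cos(theta) + y * math.sin(theta)
    return gaussian(x, y, sigma) * math.cos(2.0 * math.pi * freq * along_wave)


# With freq = 0 the carrier is cos(0) = 1, leaving only the envelope.
for point in [(0.0, 0.0), (1.0, 2.0), (-3.0, 0.5)]:
    print(gabor(*point, sigma=2.0, freq=0.0) == gaussian(*point, sigma=2.0))
```

Convolving an image with this zero frequency kernel amounts to a local weighted average, which is the integration of weak luminance values suggested above.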

Last,  larger  sample  sizes  can  probably  be  derived  from  this  database.  Vapnik's  VC 
dimension  theory  shows  that  generalization  performance  can  be  improved  by  increasing  the 
sample  size.  A  larger  training  set  capturing  more  variability  can  be  generated  from  the  current 
database  by  combining  multiple  hours  of  data.  For  example,  samples  from  one  hour  before  and 
one  hour  after  Zenith  can  be  combined  to  form  a  data  set  with  twice  the  number  of  samples.  This 
has  the  potential  for  greatly  improved  generalization  performance  because  of  the  increased 
variability  obtained  by  using  more  independent  samples. 


Test Classification Matrix                      Test Classification Matrix

patterns 66.87% correct                         1000 patterns 75.1% correct
goal 6:  0  51  0  0  140  15  794  0           goal 6:  0  0  0  0  11  109  1078  2
goal 7:  10  0  61  0  3  0  0  4126            goal 7:  0  0  30  2  0  1  0  4367

1400 patterns 90.92% correct                    1000 patterns 92% correct
goal 7:  2  0  4  6  !S  0  0  03917            goal 7:  0  1  19  8  0  0  043  69

1000 patterns 64.6% correct                     1000 patterns 78% correct
goal 6:  6  77  0  16  61  r>S7  483  0         goal 6:  0  138  1  58  128  81  3  262  3
goal 7:  2  0  18  3  1  0  0  3976             goal 7:  0  2  36  2  0  0  0  4360

800 patterns 57.12% correct                     600 patterns 84.17% correct
goal 7:  L  0  1  26  0  0  0  3972             goal 7:  0  039  0  0  0  0  4361

1200 patterns 76.66% correct                    600 patterns 87% correct

800 patterns 75.5% correct                      200 patterns 100% correct
goal 7:  5087  000  3980

800 patterns 11.5% correct                      400 patterns 50% correct